Pandas 2.x Migration Guide for Energyworx

This guide helps you migrate your custom Flow rules and Market Adapters from Pandas 1.x to Pandas 2.x. It covers every breaking change relevant to Energyworx code, with before/after examples and search patterns to find affected code.

Cross-Version Compatibility Required

The Energyworx platform is transitioning from Pandas 1.x to 2.x. During this period, your code must work with both Pandas 1.x and 2.x. This guide marks each change with its compatibility:

  • Safe now: The new syntax works in both Pandas 1.x and 2.x (e.g., "h" instead of "H", pd.concat() instead of .append())
  • 2.2+ only: The new syntax only works in Pandas 2.2 or later (e.g., "ME" instead of "M"). Do not use these yet — keep the old syntax until the platform fully migrates.

Table of Contents

  1. Quick Reference: What Changed
  2. Removed Methods and Parameters
  3. Frequency Alias Changes
  4. Timezone and Datetime Changes
  5. DataFrame Operation Changes
  6. Timedelta Handling
  7. Migrating Flow Rules
  8. Migrating Market Adapters
  9. Error Message Reference
  10. Search Patterns for Your Code
  11. Testing Your Migration

1. Quick Reference: What Changed

| Category | Severity | Compat | Summary |
| --- | --- | --- | --- |
| DataFrame.append() removed | Error | Safe now | Use pd.concat() instead |
| pd.date_range(closed=) removed | Error | Safe now | Use the inclusive= parameter |
| Frequency aliases ("H", "T", "S") | Warning | Safe now | Lowercase required: "h", "min", "s" |
| Frequency aliases ("M", "Q", "Y") | Warning | 2.2+ only | New suffixes "ME", "QE", "YE"; keep old syntax for now |
| pd.Timedelta("100y") / pd.Timedelta("6M") | Error | Safe now | Use days or DateOffset |
| Index.get_loc(method=) removed | Error | Safe now | Use get_indexer() instead |
| .ix[] accessor removed | Error | Safe now | Use .loc[] or .iloc[] |
| ExcelWriter.save() removed | Error | Safe now | Use .close() |
| is_monotonic removed | Error | Safe now | Use is_monotonic_increasing |
| infer_datetime_format parameter removed | Error | Safe now | Remove the parameter |
| pd.np removed (e.g., pd.np.nan) | Error | Safe now | Use np.nan with import numpy as np |
| pd.to_datetime() stricter format inference | Silent | See notes | Use a fallback chain for cross-version compat |
| value_counts().reset_index() columns renamed | Silent | Safe now | Use positional column access |
| groupby([col]) key type changed | Silent | Safe now | Remove the list wrapper for a single column |
| .columns & list deprecated | Warning | Safe now | Use .intersection() |
| Mixed-type DataFrame operations stricter | Error | Safe now | Select numeric columns first |
| Timezone-naive/aware mixing | Error | Safe now | Always localize timestamps |
| inplace=True deprecated | Warning | Safe now | Use assignment instead |
| Copy-on-Write behavior | Silent | Safe now | Avoid chained indexing |

2. Removed Methods and Parameters

2.1 DataFrame.append() and Series.append() Removed

The .append() method has been removed from DataFrames, Series, and Index objects. Use pd.concat() instead.

# BEFORE (Pandas 1.x)
df = df.append(other_df)
df = df.append(other_df, ignore_index=True)
series = series.append(other_series)

# AFTER (Pandas 2.x)
df = pd.concat([df, other_df])
df = pd.concat([df, other_df], ignore_index=True)
series = pd.concat([series, other_series])

Notes:

  • pd.concat() returns a new object — it does not modify in place.
  • Always wrap the objects in a list: [df1, df2].
  • For appending a single row as a dict, use pd.concat([df, pd.DataFrame([row_dict])]).

Search pattern: \.append\( (then verify it's on a DataFrame/Series, not a Python list)
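As a quick sanity check, the append-to-concat rewrite (including the single-row dict case from the notes above) can be exercised like this:

```python
import pandas as pd

df = pd.DataFrame({"value": [1.0, 2.0]})
other = pd.DataFrame({"value": [3.0]})

# Old: df = df.append(other, ignore_index=True)
combined = pd.concat([df, other], ignore_index=True)

# Appending a single row held in a dict:
row = {"value": 4.0}
combined = pd.concat([combined, pd.DataFrame([row])], ignore_index=True)

print(combined["value"].tolist())  # [1.0, 2.0, 3.0, 4.0]
```

Both forms work identically on Pandas 1.x and 2.x.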


2.2 pd.date_range(closed=) Removed

The closed parameter in pd.date_range() has been replaced with inclusive.

Note

This change only applies to pd.date_range(). Other methods like IntervalIndex.from_breaks(), IntervalIndex.from_arrays(), and pd.cut() still use the closed parameter in Pandas 2.x.

# BEFORE (Pandas 1.x)
pd.date_range(start, end, freq="h", closed="right")
pd.date_range(start, end, freq="h", closed="left")
pd.date_range(start, end, freq="h", closed=None)

# AFTER (Pandas 2.x)
pd.date_range(start, end, freq="h", inclusive="right")
pd.date_range(start, end, freq="h", inclusive="left")
pd.date_range(start, end, freq="h", inclusive="both")

Mapping:

| Old (closed=) | New (inclusive=) | Meaning |
| --- | --- | --- |
| closed=None | inclusive="both" | Include both start and end |
| closed="left" | inclusive="left" | Include start, exclude end |
| closed="right" | inclusive="right" | Exclude start, include end |

Search pattern: date_range\([^)]*closed\s*=
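If a shared helper is easier than touching every call site, a version-agnostic wrapper can absorb the rename; the helper name date_range_compat is hypothetical, and the fallback relies on inclusive= raising TypeError on Pandas versions before 1.4:

```python
import pandas as pd

def date_range_compat(start, end, freq, side="both"):
    """pd.date_range() wrapper usable on both Pandas 1.x and 2.x.

    'side' uses the new inclusive= vocabulary ("both", "left", "right").
    Hypothetical helper; adapt to your codebase.
    """
    try:
        return pd.date_range(start, end, freq=freq, inclusive=side)
    except TypeError:
        # Pandas < 1.4 does not know inclusive=; closed=None means "both"
        return pd.date_range(start, end, freq=freq,
                             closed=None if side == "both" else side)

rng = date_range_compat("2024-01-01", "2024-01-02", freq="h", side="right")
print(len(rng))  # 24 hourly points; the start timestamp is excluded
```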


2.3 Index.get_loc(method=) Removed

The method parameter has been removed from Index.get_loc(). Use Index.get_indexer() instead.

# BEFORE (Pandas 1.x)
idx = df.index.get_loc(date, method="nearest")

# AFTER (Pandas 2.x)
idx = df.index.get_indexer([date], method="nearest")[0]

Search pattern: \.get_loc\([^)]*method\s*=


2.4 .ix[] Accessor Removed

The .ix[] accessor was removed. Use .loc[] (label-based) or .iloc[] (position-based) instead.

# BEFORE (Pandas 1.x)
value = df.ix[row_label]
value = df.ix[0]

# AFTER (Pandas 2.x)
value = df.loc[row_label] # by label
value = df.iloc[0] # by position

Search pattern: \.ix\[


2.5 ExcelWriter.save() Removed

# BEFORE (Pandas 1.x)
writer = pd.ExcelWriter(output, engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
writer.save()

# AFTER (Pandas 2.x)
writer = pd.ExcelWriter(output, engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
writer.close()

Search pattern: writer\.save\(\)


2.6 is_monotonic Removed

# BEFORE (Pandas 1.x)
df.index.is_monotonic

# AFTER (Pandas 2.x)
df.index.is_monotonic_increasing

Search pattern: \.is_monotonic(?!_)


2.7 infer_datetime_format Parameter Removed

The infer_datetime_format parameter has been removed from pd.to_datetime() and pd.read_csv(). Pandas 2.x infers the format automatically — simply remove the parameter.

# BEFORE (Pandas 1.x)
pd.to_datetime(series, infer_datetime_format=True)
pd.read_csv(file, parse_dates=True, infer_datetime_format=True)

# AFTER (Pandas 2.x)
pd.to_datetime(series)
pd.read_csv(file, parse_dates=True)

Search pattern: infer_datetime_format


2.8 pd.np Removed

The pd.np alias for numpy has been removed. Use import numpy as np and reference np directly.

# BEFORE (Pandas 1.x)
import pandas as pd
df.replace({pd.np.nan: ''})
value = pd.np.nan

# AFTER (Pandas 2.x)
import pandas as pd
import numpy as np
df.replace({np.nan: ''})
value = np.nan

Notes:

  • This is very common in Market Adapters that use pd.np.nan as a sentinel value.
  • The fix is straightforward: add import numpy as np and replace pd.np with np.

Search pattern: pd\.np\.


3. Frequency Alias Changes

Pandas 2.2 deprecated many frequency alias strings. These currently emit a FutureWarning but will become errors in a future version.

3.1 Safe to Change Now (works in both Pandas 1.x and 2.x)

These lowercase aliases are accepted by both Pandas 1.x and 2.x. Change these now:

| Old Alias | New Alias | Meaning | Affects |
| --- | --- | --- | --- |
| "H" | "h" | Hour | resample(), date_range(), Grouper(), Timedelta() |
| "T" | "min" | Minute | Same |
| "S" | "s" | Second | Same |
| "L" | "ms" | Millisecond | Same |
| "U" | "us" | Microsecond | Same |
| "N" | "ns" | Nanosecond | Same |

3.2 Do NOT Change Yet (only valid in Pandas 2.2+)

These new aliases ("ME", "QE", "YE", etc.) are not recognized by Pandas versions before 2.2 and will raise ValueError: Invalid frequency. Since the platform must support both Pandas 1.x and 2.x, keep the old aliases for now. They emit a FutureWarning in 2.2+ but still work.

| Old Alias | Future Alias | Meaning | Action |
| --- | --- | --- | --- |
| "M" | "ME" | Month End | Keep "M" for now |
| "Q" | "QE" | Quarter End | Keep "Q" for now |
| "Y" or "A" | "YE" | Year End | Keep "Y" for now |
| "BM" | "BME" | Business Month End | Keep "BM" for now |
| "BQ" | "BQE" | Business Quarter End | Keep "BQ" for now |
| "BA" | "BYE" | Business Year End | Keep "BA" for now |
| "AS" | "YS" | Year Start | Keep "AS" for now |
| "BAS" | "BYS" | Business Year Start | Keep "BAS" for now |

Aliases that are still valid (no change needed): "D" (day), "W" (week), "MS" (month start), "QS" (quarter start), "B" (business day).

3.3 Common Energyworx Examples

# BEFORE (Pandas 1.x)
df.resample("H").sum()
df.resample("15T").mean()
pd.date_range(start, end, freq="1H")
pd.Grouper(freq="1H")

# AFTER (safe for both Pandas 1.x and 2.x)
df.resample("h").sum()
df.resample("15min").mean()
pd.date_range(start, end, freq="1h")
pd.Grouper(freq="1h")

# NOTE: Keep "M" for now — "ME" only works in Pandas 2.2+
pd.Grouper(freq="M") # keep as-is (will emit FutureWarning in 2.2+)
df.resample("M").sum() # keep as-is

Compound frequencies: When a number precedes the alias, update only the letter part:

  • "1H""1h"
  • "15T""15min"
  • "30S""30s"
  • "100L""100ms"

Search patterns:

  • Hour: freq\s*=\s*["'][^"']*H["'] or resample\(\s*["'][^"']*H["']
  • Minute: freq\s*=\s*["'][^"']*T["']
  • Second: freq\s*=\s*["'][^"']*[0-9]S["'] (careful: "MS" is valid)
  • Month end: freq\s*=\s*["']M["'] (exactly "M", not "MS" or "ME")
  • Year end: freq\s*=\s*["'][^"']*[AY]["']
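If you would rather automate the safe renames, a small translation helper can apply them mechanically; SAFE_FREQ_RENAMES and modernize_freq are hypothetical names, and the helper deliberately leaves "M"/"Q"/"Y" alone because their replacements need Pandas 2.2+:

```python
# Lowercase aliases accepted by both Pandas 1.x and 2.x (Section 3.1).
SAFE_FREQ_RENAMES = {"H": "h", "T": "min", "S": "s",
                     "L": "ms", "U": "us", "N": "ns"}

def modernize_freq(freq):
    """Rewrite a frequency string like '15T' to '15min' (sketch only).

    Only rewrites when the part before the alias is empty or a number,
    so multi-letter aliases like "MS" pass through untouched.
    """
    for old, new in SAFE_FREQ_RENAMES.items():
        if freq.endswith(old):
            prefix = freq[: -len(old)]
            if prefix == "" or prefix.isdigit():
                return prefix + new
    return freq

print(modernize_freq("15T"))  # 15min
print(modernize_freq("1H"))   # 1h
print(modernize_freq("MS"))   # MS (month start is still valid; left alone)
print(modernize_freq("M"))    # M  (left alone on purpose: "ME" is 2.2+ only)
```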

4. Timezone and Datetime Changes

4.1 Cannot Mix Timezone-Naive and Timezone-Aware

Pandas 2.x strictly rejects operations that mix timezone-naive and timezone-aware datetime objects. This is especially important in Energyworx because self.flow_timestamp is timezone-naive (even though it represents UTC).

# BEFORE (Pandas 1.x) — worked but was technically incorrect
start = pd.Timestamp("2024-01-01") # naive
df = self.dataframe # has UTC DatetimeIndex
result = df.loc[start:] # worked implicitly

# AFTER (Pandas 2.x) — must match timezone
start = pd.Timestamp("2024-01-01", tz="UTC")
df = self.dataframe
result = df.loc[start:]

Common fix for self.flow_timestamp:

# BEFORE
timestamp = pd.Timestamp(self.flow_timestamp)

# AFTER
timestamp = pd.Timestamp(self.flow_timestamp, tz="UTC")

Common fix for computed timestamps:

# BEFORE
edit_date = pd.Timestamp(start_date) + pd.Timedelta(hours=1)

# AFTER — localize AFTER arithmetic, or localize the input
edit_date = (pd.Timestamp(start_date) + pd.Timedelta(hours=1)).tz_localize("UTC")
# OR
edit_date = pd.Timestamp(start_date, tz="UTC") + pd.Timedelta(hours=1)

Search pattern: pd\.Timestamp\([^)]*\) where no tz= appears — then check if it's used with tz-aware data.
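A small helper can make any timestamp UTC-aware regardless of input; ensure_utc is a hypothetical name, and it assumes (as with self.flow_timestamp) that naive inputs already represent UTC, so they are localized rather than shifted:

```python
import pandas as pd

def ensure_utc(ts):
    """Return a tz-aware UTC Timestamp whether the input is naive or aware."""
    ts = pd.Timestamp(ts)
    if ts.tz is None:
        # Naive input: assumed to already be UTC, so just attach the zone
        return ts.tz_localize("UTC")
    # Aware input: convert to UTC
    return ts.tz_convert("UTC")

print(ensure_utc("2024-01-01 12:00"))  # 2024-01-01 12:00:00+00:00
print(ensure_utc(pd.Timestamp("2024-01-01 13:00", tz="Europe/Amsterdam")))
```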


4.2 Using .date() on Timezone-Aware Index

Calling .date() on a timezone-aware Timestamp returns a timezone-naive datetime.date, which cannot be used for slicing a timezone-aware index. Use .normalize() instead.

# BEFORE (Pandas 1.x)
end = df.index[-1].date()
result = df.loc[:end, columns]

# AFTER (Pandas 2.x) — .normalize() gives midnight in the same timezone
end = df.index[-1].normalize()
result = df.loc[:end, columns]

Search pattern: \.index\[.*\]\.date\(\)


4.3 Timezone Comparisons

Pandas 2.x may represent UTC using different timezone objects internally. Direct comparison with pytz.UTC or datetime.timezone.utc can fail.

# BEFORE (Pandas 1.x)
import datetime as dt
assert df.index.tz == dt.timezone.utc

# AFTER (Pandas 2.x) — flexible check
assert str(df.index.tz) in ("UTC", "UTC+00:00") or df.index.tz == dt.timezone.utc

Search pattern: \.tz\s*==


4.4 pd.to_datetime() Format Inference

Pandas 2.x no longer guesses the format when a column contains mixed date formats. If your data has inconsistent formats, you must handle this explicitly.

Cross-Version Note

The format="mixed" and format="ISO8601" parameters are only available in Pandas 2.0+. If your code must run on both Pandas 1.x and 2.x, use the fallback pattern below.

Cross-version fallback pattern (recommended):

# Safe for both Pandas 1.x and 2.x
def parse_dates(series, dateformat=None, utc=False):
    """Parse dates with graduated fallback for cross-version compatibility."""
    try:
        return pd.to_datetime(series, format=dateformat, utc=utc)
    except ValueError:
        # Pandas 2.x enforces strict format matching. Try fallback strategies
        # to handle minor format variations (e.g. ISO 'T' separator vs space).
        fallbacks = [None, "mixed"] if dateformat else ["mixed"]
        for fmt in fallbacks:
            try:
                return pd.to_datetime(series, format=fmt, utc=utc)
            except (ValueError, TypeError):
                continue
        raise

This pattern works because:

  1. On Pandas 1.x, the initial call with format=dateformat usually succeeds (lenient matching), and format=None also works as a fallback.
  2. On Pandas 2.x, if strict matching fails, format=None (auto-infer) is tried first, then format="mixed" as a last resort.
  3. The format="mixed" call is only reached on Pandas 2.x where it's available.

If you only need Pandas 2.x support:

# Pandas 2.x only
dates = pd.to_datetime(series, format="mixed")
# OR for ISO 8601 strings
dates = pd.to_datetime(series, format="ISO8601")
# OR specify exact format
dates = pd.to_datetime(series, format="%Y-%m-%d %H:%M:%S")

When to use which:

  • format="mixed": Data contains multiple different formats (e.g., some rows "2024-01-01", others "01/01/2024")
  • format="ISO8601": All dates are ISO 8601 but with varying precision (e.g., some with seconds, some without)
  • Explicit format string: All dates follow the same format

Search pattern: pd\.to_datetime\( where no format= parameter is specified — check if the input data could have mixed formats.


4.5 datetime64 Resolution Changes

Pandas 2.x supports multiple datetime resolutions (datetime64[s], [ms], [us], [ns]) instead of only nanoseconds. This can cause issues when combining data with different resolutions.

# If you encounter resolution mismatch errors:
df.index = df.index.as_unit("ns") # convert to nanoseconds
# OR
series = series.dt.as_unit("ns")
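Since as_unit() does not exist in Pandas 1.x (where everything is already nanoseconds), a guarded call keeps the same code working on both versions; a minimal sketch:

```python
import pandas as pd

idx = pd.to_datetime(["2024-01-01", "2024-01-02"])

# DatetimeIndex.as_unit() only exists in Pandas 2.x; guard the call so the
# same code also runs on 1.x, where the resolution is always nanoseconds.
if hasattr(idx, "as_unit"):
    idx = idx.as_unit("ns")

print(idx.dtype)  # datetime64[ns]
```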

5. DataFrame Operation Changes

5.1 Year-String Indexing

In Pandas 2.x, df["2024"] looks for a column named "2024" rather than filtering a DatetimeIndex by year.

# BEFORE (Pandas 1.x)
result = df["2024"]
result = df["2024"]["column_name"]

# AFTER (Pandas 2.x) — use .loc[]
result = df.loc["2024"]
result = df.loc["2024", "column_name"]

Search pattern: df\["[0-9]{4}"\]


5.2 Mixed-Type DataFrame Operations

Operations like .sum(), comparisons (<, >), and .clip() now raise errors when the DataFrame contains non-numeric columns (e.g., datetime or string columns).

# BEFORE (Pandas 1.x) — silently skipped non-numeric columns
total = df.sum()
negative_mask = df < 0

# AFTER (Pandas 2.x) — select numeric columns first
total = df.sum(numeric_only=True)
# OR
total = df[column_name].sum()

numeric_cols = df.select_dtypes(include=["number"]).columns
negative_mask = df[numeric_cols] < 0

For .clip():

# BEFORE (Pandas 1.x)
df = df.clip(lower=0)

# AFTER (Pandas 2.x)
numeric_cols = df.select_dtypes(include=["number"]).columns
df[numeric_cols] = df[numeric_cols].clip(lower=0)

Search pattern: \.sum\(\), \.clip\(, df\s*[<>] — check if the DataFrame could contain non-numeric columns.


5.3 DataFrame.columns & list Deprecated

Using the & operator between an Index and a list is deprecated. Use .intersection() instead.

# BEFORE (Pandas 1.x)
columns = df.columns & ["col1", "col2", "col3"]

# AFTER (Pandas 2.x)
columns = df.columns.intersection(["col1", "col2", "col3"])

Search pattern: \.columns\s*&\s*\[


5.4 groupby([single_column]) Key Type Changed

When grouping by a single column wrapped in a list, Pandas 2.x returns tuple keys (e.g., ("A",)) instead of scalar keys (e.g., "A"). Remove the list wrapper for single-column groupby.

# BEFORE (Pandas 1.x) — key is "A" (scalar)
for key, group in df.groupby([column_name]):
    print(key)  # "A"

# AFTER (Pandas 2.x) — remove list wrapper to get scalar keys
for key, group in df.groupby(column_name):
    print(key)  # "A"

Important: Only change this for single column groupby. Multi-column groupby should keep the list:

# Multi-column — keep the list
for key, group in df.groupby([col1, col2]):
    print(key)  # ("A", "B") — tuple in both versions

Search pattern: \.groupby\(\[ — check if only one column is inside the brackets.
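If a call site cannot easily drop the list wrapper, a key-normalizing helper gives scalar keys on both versions; scalar_key is a hypothetical name:

```python
import pandas as pd

df = pd.DataFrame({"meter_id": ["A", "A", "B"], "value": [1, 2, 3]})

def scalar_key(key):
    """Unwrap the 1-tuple keys Pandas 2.x yields for groupby([col])."""
    if isinstance(key, tuple) and len(key) == 1:
        return key[0]
    return key

# Works whether the groupby was written with or without the list wrapper:
keys = [scalar_key(k) for k, _ in df.groupby(["meter_id"])]
print(keys)  # ['A', 'B']
```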


5.5 value_counts().reset_index() Column Names Changed

The output column names from .value_counts().reset_index() have changed.

# Pandas 1.x result columns: ["index", "column_name"]
# Pandas 2.x result columns: ["column_name", "count"]

# BEFORE (Pandas 1.x)
counts = df["status"].value_counts().reset_index()
value = counts["index"][0]
count = counts["status"][0]

# AFTER (Pandas 2.x) — use positional access for version-agnostic code
counts = df["status"].value_counts().reset_index()
value_col = counts.columns[0] # the original values
count_col = counts.columns[1] # the counts
value = counts[value_col][0]
count = counts[count_col][0]

Search pattern: \.value_counts\(\)\.reset_index\(\)


5.6 Series Assignment to Filtered DataFrames

Assigning a Series to filtered DataFrame rows can fail when the index has duplicate labels. Use .values to convert to a numpy array first.

# BEFORE (Pandas 1.x)
df.loc[mask, "column"] = some_series

# AFTER (Pandas 2.x) — use .values to bypass reindexing
df.loc[mask, "column"] = some_series[mask].values

Search pattern: \.loc\[[^\]]*\]\s*= — then check whether the right-hand side is a Series assigned without .values.


5.7 Copy-on-Write and Chained Indexing

Pandas 2.x enables Copy-on-Write by default. Chained indexing (getting a value through two successive [] operations) no longer modifies the original DataFrame.

# BEFORE (Pandas 1.x) — modified df in place
df["column"][mask] = new_value

# AFTER (Pandas 2.x) — use .loc[] for direct modification
df.loc[mask, "column"] = new_value

Search pattern: df\[["'][^"']+["']\]\[ — look for df["col"][...].


5.8 inplace=True Deprecated

The inplace parameter is deprecated on most DataFrame/Series methods. Use assignment instead.

# BEFORE (Pandas 1.x)
df.reset_index(inplace=True)
df.sort_values("col", inplace=True)
df.drop(columns=["col"], inplace=True)
df.fillna(0, inplace=True)

# AFTER (Pandas 2.x)
df = df.reset_index()
df = df.sort_values("col")
df = df.drop(columns=["col"])
df = df.fillna(0)

Search pattern: inplace\s*=\s*True


6. Timedelta Handling

Timedelta operations have several compatibility pitfalls. This section covers patterns that are safe across Pandas versions.

6.1 pd.Timedelta with Year/Month Units

Year ("Y", "y") and month ("M") units are no longer accepted because they're ambiguous (a year can be 365 or 366 days; a month can be 28–31 days).

# BEFORE (Pandas 1.x)
pd.Timedelta("100y")
pd.Timedelta("6M")

# AFTER (Pandas 2.x) — use explicit days
pd.Timedelta(days=36500) # ~100 years
pd.Timedelta(days=180) # ~6 months

# OR use DateOffset for calendar-aware offsets
pd.DateOffset(years=100)
pd.DateOffset(months=6)

Note: pd.DateOffset respects calendar months/years (e.g., adding 1 month to Jan 31 gives Feb 28), while pd.Timedelta(days=30) always adds exactly 30 days. Use whichever is correct for your business logic.

Search pattern: pd\.Timedelta\(\s*["'][0-9]+[yYmM]["']\s*\)
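The calendar-aware vs fixed-length distinction is easiest to see on a month boundary (2024 is a leap year):

```python
import pandas as pd

jan31 = pd.Timestamp("2024-01-31")

# Calendar-aware: "one month later" clamps to the last day of February
print(jan31 + pd.DateOffset(months=1))  # 2024-02-29 00:00:00

# Fixed-length: exactly 30 days, which lands in March
print(jan31 + pd.Timedelta(days=30))    # 2024-03-01 00:00:00
```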


6.2 Converting Timedelta to Seconds or Days

The .dt.total_seconds() accessor and division by pd.Timedelta() can behave differently across versions. The safest cross-version approach uses numpy:

import numpy as np

# BEFORE (Pandas 1.x) — may fail or give wrong results in 2.x
seconds = timedelta_series.dt.total_seconds()
seconds = timedelta_series / pd.Timedelta(seconds=1)
days = timedelta_series.dt.days

# AFTER (safe across versions) — use numpy timedelta64
seconds = pd.Series(
    timedelta_series.values / np.timedelta64(1, 's'),
    index=timedelta_series.index
)
days = pd.Series(
    timedelta_series.values / np.timedelta64(1, 'D'),
    index=timedelta_series.index
).astype(int)

Key rule: Always use .values to get the underlying numpy array before dividing by np.timedelta64().

Search pattern: \.dt\.total_seconds\(\), \.dt\.days, /\s*pd\.Timedelta\(


6.3 Timedelta .astype(int) Returns Nanoseconds

When you call .astype(int) on a timedelta column, it converts to the internal representation (nanoseconds), not seconds. This can silently produce values that are 1,000,000,000x larger than expected.

# BEFORE (Pandas 1.x) — often appeared to work because of implicit conversions
interval_seconds = timedelta_column.astype(int)
# Danger: returns nanoseconds, not seconds!

# AFTER — convert to seconds explicitly first
interval_seconds = (timedelta_column.values / np.timedelta64(1, 's')).astype(int)

Search pattern: timedelta.*\.astype\(\s*int\s*\), \.astype\(\s*int\s*\) on columns that might contain timedelta values.


6.4 Extracting Values from Timedelta Columns

When you extract a single value from a timedelta column (e.g., df[column][0]), the result type depends on the pandas version and operation history. It may be:

  • A float (seconds) — use directly
  • A timedelta64 object — divide by np.timedelta64(1, 's')
  • An int64 containing nanoseconds — divide by 1,000,000,000

import numpy as np

# Robust extraction pattern
raw_value = df[column].iloc[0]
try:
    numeric_value = int(raw_value)
except (TypeError, ValueError):
    # It's a timedelta object — convert to seconds
    value_in_seconds = int(raw_value / np.timedelta64(1, 's'))
else:
    # Check if it's nanoseconds (> ~10 years in seconds)
    if numeric_value > 315_360_000:
        value_in_seconds = numeric_value // 1_000_000_000
    else:
        value_in_seconds = numeric_value

7. Migrating Flow Rules

This section walks through the most common patterns found in Energyworx Flow rules and how to update them.

7.1 Timezone Conversion Pattern

This is the most common pattern in rules — converting between UTC and local time:

class MyRule(AbstractRule):
    def apply(self, **kwargs):
        local_tz = self.datasource.timezone
        df = self.dataframe[[self.source_column]].copy()

        # Convert to local time for business logic
        df = df.tz_convert(local_tz)

        # MIGRATION CHECK: If you create timestamps for slicing,
        # make sure they are timezone-aware
        # BEFORE:
        # start = pd.Timestamp("2024-01-01")
        # AFTER:
        start = pd.Timestamp("2024-01-01", tz=local_tz)

        # Process...
        result = df.loc[start:]

        # Convert back to UTC
        result = result.tz_convert("UTC")
        return RuleResult(result=result)

7.2 Resampling Pattern

Many rules aggregate data using .resample(). Update frequency aliases and check closed parameter usage:

class MyAggregationRule(AbstractRule):
    def apply(self, interval="h", **kwargs):
        df = self.dataframe[[self.source_column]].copy()

        # BEFORE:
        # resampled = df.resample("H", closed="right", label="right").sum()
        # AFTER:
        resampled = df.resample("h", closed="right", label="right").sum()

        # Note: 'closed' parameter still works on resample() — only
        # pd.date_range() replaced it with 'inclusive'.

        return RuleResult(result=resampled)

7.3 Date Range Generation Pattern

Rules that generate date ranges (e.g., for gap filling or profile creation):

class MyGapFillRule(AbstractRule):
    def apply(self, heartbeat=3600, **kwargs):
        start = self.dataframe.index[0]
        end = self.dataframe.index[-1]

        # BEFORE (raises TypeError in Pandas 2.x):
        # full_range = pd.date_range(start, end, freq="{}s".format(heartbeat), closed="right")
        # AFTER:
        full_range = pd.date_range(start, end, freq="{}s".format(heartbeat), inclusive="right")

        return RuleResult()

7.4 Using self.flow_timestamp

The self.flow_timestamp attribute is always timezone-naive, even though it represents UTC. In Pandas 2.x, you must localize it before using it with timezone-aware data:

class MyRule(AbstractRule):
    def apply(self, **kwargs):
        # BEFORE — worked with implicit conversion:
        # flow_ts = pd.Timestamp(self.flow_timestamp)
        # df = self.dataframe.loc[:flow_ts]

        # AFTER — explicit timezone:
        flow_ts = pd.Timestamp(self.flow_timestamp, tz="UTC")
        df = self.dataframe.loc[:flow_ts]

        return RuleResult()

7.5 Concatenating DataFrames in Rules

Rules that combine data from multiple sources:

class MyCombineRule(AbstractRule):
    def prepare_context(self, other_datasource_id, **kwargs):
        return {
            "prepare_datasource_ids": [other_datasource_id],
            "other_id": other_datasource_id,
        }

    def apply(self, **kwargs):
        other_ds = self.prepared_datasources[self.context["other_id"]]
        other_df = self.load_timeseries(other_ds.id, [self.source_column],
                                        self.dataframe.index[0],
                                        self.dataframe.index[-1])

        # BEFORE (raises AttributeError in Pandas 2.x):
        # combined = self.dataframe.append(other_df)
        # AFTER:
        combined = pd.concat([self.dataframe, other_df])

        return RuleResult(result=combined)

7.6 Grouper with Frequency

Rules that group by time periods:

class MyMonthlyRule(AbstractRule):
    def apply(self, **kwargs):
        df = self.dataframe[[self.source_column]].copy()

        # BEFORE:
        # monthly = df.groupby(pd.Grouper(freq="M")).sum()
        # hourly = df.groupby(pd.Grouper(freq="1H")).mean()
        # AFTER:
        monthly = df.groupby(pd.Grouper(freq="M")).sum()   # keep "M" — "ME" is 2.2+ only
        hourly = df.groupby(pd.Grouper(freq="1h")).mean()  # "h" is safe to change now

        return RuleResult(result=monthly)

7.7 Sum on DataFrames with Multiple Column Types

Rules that sum across all columns when some columns are non-numeric:

class MyValidationRule(AbstractRule):
    def apply(self, **kwargs):
        df = self.dataframe.copy()

        # BEFORE — silently skipped datetime columns in 1.x, errors in 2.x:
        # total = df.sum().values[0]

        # AFTER — specify the column or use numeric_only:
        total = df[self.source_column].sum()
        # OR
        # total = df.sum(numeric_only=True).values[0]

        return RuleResult()

8. Migrating Market Adapters

Market Adapters typically use pandas for parsing files and reshaping data. The most common migration issues are in the split() and adapt() methods.

8.1 CSV Parsing with Date Columns

class MyCSVAdapter(PluggableMarketAdapter):
    def adapt(self, content, current_datetime, **kwargs):
        df = pd.read_csv(
            io.StringIO(content),
            # BEFORE:
            # parse_dates=["date_col"],
            # infer_datetime_format=True,
            # AFTER — remove infer_datetime_format:
            parse_dates=["date_col"],
        )
        # If dates have mixed formats, parse separately:
        # df["date_col"] = pd.to_datetime(df["date_col"], format="mixed")

        return self.normalize_csv(df.to_csv(index=False))

8.2 Groupby for Splitting by Datasource

class MyCSVAdapter(PluggableMarketAdapter):
    def split(self, content, **kwargs):
        df = pd.read_csv(io.StringIO(content), dtype=str)

        # BEFORE — single column in list:
        # for datasource_id, group in df.groupby(["meter_id"]):
        #     # datasource_id was a scalar in 1.x, tuple in 2.x
        #     yield group.to_csv(index=False)

        # AFTER — remove list wrapper for single column:
        for datasource_id, group in df.groupby("meter_id"):
            # datasource_id is always a scalar
            yield group.to_csv(index=False)

8.3 Horizontal-to-Vertical Format Conversion

Adapters that reshape horizontal (wide) data into vertical (long) format:

class MyHorizontalAdapter(PluggableMarketAdapter):
    def adapt(self, content, current_datetime, **kwargs):
        df = pd.read_csv(io.StringIO(content), dtype=str)
        dates = pd.to_datetime(df["date"])

        # Creating time intervals
        intervals = pd.timedelta_range(start="0h", periods=24, freq="1h")

        # TimedeltaIndex.append() still works in 2.x; only DataFrame/Series
        # .append() was removed, so no change is required here:
        all_intervals = intervals.append(pd.TimedeltaIndex([pd.Timedelta(hours=25)]))
        # If you prefer to avoid .append() entirely:
        # all_intervals = pd.TimedeltaIndex(
        #     list(intervals) + [pd.Timedelta(hours=25)]
        # )

        return self.normalize_json(result)

8.4 Excel File Handling

class MyExcelAdapter(PluggableMarketAdapter):
    def adapt(self, content, current_datetime, **kwargs):
        excel_file = pd.ExcelFile(io.BytesIO(content.encode()))
        df = excel_file.parse("Sheet1")

        # BEFORE — if writing Excel output:
        # writer = pd.ExcelWriter(output, engine="xlsxwriter")
        # df.to_excel(writer, sheet_name="Output")
        # writer.save()  # Removed in 2.x

        # AFTER:
        writer = pd.ExcelWriter(output, engine="xlsxwriter")
        df.to_excel(writer, sheet_name="Output")
        writer.close()  # Use close() instead

        return self.normalize_csv(df.to_csv(index=False))

8.5 Replacing pd.np.nan in Adapters

Many adapters use pd.np.nan to replace or detect missing values. This alias was removed in Pandas 2.x.

class MyCSVAdapter(PluggableMarketAdapter):
    def split(self, element, **kwargs):
        import numpy as np

        # ... build the per-datasource 'group' DataFrame from 'element' ...

        # BEFORE:
        # group.replace({pd.np.nan: ''}, inplace=True)

        # AFTER: plain assignment, no pd.np and no inplace
        group = group.replace({np.nan: ''})

        yield group.values.tolist()

8.6 Date Range in Adapters

Adapters that create date ranges for timeseries output:

class MyDomainAdapter(PluggableMarketAdapter):
    def adapt(self, content, current_datetime, **kwargs):
        # BEFORE:
        # timestamps = pd.date_range(
        #     start=dt.datetime(2024, 1, 1, tzinfo=pytz.UTC),
        #     end=dt.datetime(2024, 1, 1, 23, 0, 0, tzinfo=pytz.UTC),
        #     freq='1H'  # deprecated alias
        # )

        # AFTER:
        timestamps = pd.date_range(
            start=dt.datetime(2024, 1, 1, tzinfo=pytz.UTC),
            end=dt.datetime(2024, 1, 1, 23, 0, 0, tzinfo=pytz.UTC),
            freq='1h'  # lowercase alias
        )

        df = pd.DataFrame({"channel_1": values}, index=timestamps)
        timeseries = self.create_timeseries(df=df, datasource=ds, version=current_datetime)
        self.output_timeseries(timeseries)

9. Error Message Reference

When you encounter one of these errors, use the table to find the fix:

| Error Message | Cause | Fix | Section |
| --- | --- | --- | --- |
| 'DataFrame' object has no attribute 'append' | DataFrame.append() removed | Use pd.concat([df1, df2]) | 2.1 |
| got an unexpected keyword argument 'closed' | closed parameter removed from date_range() | Use inclusive= | 2.2 |
| got an unexpected keyword argument 'method' on get_loc | method removed from get_loc() | Use get_indexer() | 2.3 |
| 'DataFrame' object has no attribute 'ix' | .ix[] removed | Use .loc[] or .iloc[] | 2.4 |
| 'XlsxWriter' object has no attribute 'save' | save() removed | Use .close() | 2.5 |
| 'Index' object has no attribute 'is_monotonic' | is_monotonic removed | Use is_monotonic_increasing | 2.6 |
| got an unexpected keyword argument 'infer_datetime_format' | Parameter removed | Remove the parameter | 2.7 |
| module 'pandas' has no attribute 'np' | pd.np removed | Use import numpy as np and np.nan | 2.8 |
| FutureWarning: 'H' is deprecated and will be removed... | Old frequency alias | Use 'h' | 3 |
| FutureWarning: 'M' is deprecated...use 'ME'... | Old frequency alias | Keep 'M' for now; 'ME' is 2.2+ only | 3 |
| Units 'M', 'Y' and 'y' do not represent unambiguous timedelta values | Ambiguous Timedelta unit | Use days: pd.Timedelta(days=N) | 6.1 |
| Cannot compare tz-naive and tz-aware datetime-like objects | Mixed timezone awareness | Add tz="UTC" or .tz_localize("UTC") | 4.1 |
| cannot reindex on an axis with duplicate labels | Series reindexing conflict | Use .values | 5.6 |
| 'DatetimeArray' with dtype datetime64[ns] does not support reduction 'sum' | Summing datetime columns | Use numeric_only=True | 5.2 |
| UFuncBinaryResolutionError | Timedelta division incompatibility | Use np.timedelta64() | 6.2 |

10. Search Patterns for Your Code

Use these patterns to scan your code for potential migration issues. Each can be used with your IDE's search (regex mode) or with grep -E.

Critical — Will Error

# DataFrame/Series.append()
\.append\(

# pd.date_range with closed=
date_range\([^)]*closed\s*=

# Deprecated frequency aliases — safe to change now
(?:resample|date_range|Grouper|Timedelta)\([^)]*["'][^"']*(?<![a-zA-Z])H(?!z)["']

# Deprecated frequency aliases — DO NOT change yet (2.2+ only)
# These will emit FutureWarning but still work. Keep as-is for cross-version compat.
# freq\s*=\s*["']M["']
# freq\s*=\s*["'][AY]["']

# pd.Timedelta with year/month units
pd\.Timedelta\(\s*["'][0-9]+[yYmM]["']

# get_loc with method=
\.get_loc\([^)]*method\s*=

# .ix[] accessor
\.ix\[

# ExcelWriter.save()
\.save\(\)

# is_monotonic (without _increasing/_decreasing)
\.is_monotonic(?!_)

# infer_datetime_format
infer_datetime_format

# pd.np (e.g., pd.np.nan)
pd\.np\.

High Priority — Silent Behavior Changes

# pd.to_datetime without format= (check for mixed data)
pd\.to_datetime\((?![^)]*format\s*=)

# value_counts().reset_index()
\.value_counts\(\)\.reset_index\(\)

# groupby with single column in list
\.groupby\(\[[^\],]+\]\)

# .columns & list
\.columns\s*&\s*\[

# .sum() on DataFrames (check for non-numeric columns)
\.sum\(\s*\)

# Timezone-naive Timestamps used with tz-aware data
pd\.Timestamp\((?![^)]*tz\s*=)

# .date() on tz-aware timestamps
\.date\(\)

# Year-string indexing
\[["'][0-9]{4}["']\]

# Chained indexing
df\[["'][^"']+["']\]\[

Medium Priority — Deprecation Warnings

# inplace=True
inplace\s*=\s*True

# Timezone comparisons
\.tz\s*==

# .dt.total_seconds() on timedelta
\.dt\.total_seconds\(\)

# timedelta .astype(int)
\.astype\(\s*int\s*\)
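The patterns above can also be wired into a small scanner script instead of running each one by hand in an IDE; this sketch covers only a few of the critical patterns (extend PATTERNS with the rest as needed — all names here are hypothetical):

```python
import re

# A few of the "will error" patterns from this section, compiled once.
PATTERNS = {
    "append": re.compile(r"\.append\("),
    "date_range closed=": re.compile(r"date_range\([^)]*closed\s*="),
    "pd.np": re.compile(r"pd\.np\."),
    "ix accessor": re.compile(r"\.ix\["),
}

def scan_source(source):
    """Yield (lineno, pattern_name) hits for one file's text."""
    for lineno, line in enumerate(source.splitlines(), 1):
        for name, rx in PATTERNS.items():
            if rx.search(line):
                yield lineno, name

sample = "df = df.append(other)\nrng = pd.date_range(a, b, closed='left')\n"
print(list(scan_source(sample)))
# [(1, 'append'), (2, "date_range closed=")]
```

To scan a whole codebase, feed each .py file's text through scan_source (e.g. via pathlib.Path.rglob).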

11. Testing Your Migration

Step-by-Step Testing Approach

  1. Search your code using the patterns from Section 10 to identify all affected lines.

  2. Apply the fixes from this guide, working through one category at a time.

  3. Run your unit tests if you have them. Pay special attention to:

    • Tests that create DataFrames with timezone-aware indices
    • Tests that use timedelta operations
    • Tests that assert on specific column names after value_counts()
  4. Test with real data on a non-production environment. Check:

    • Do resample operations produce the same number of output rows?
    • Are timezone conversions producing correct local times?
    • Are numeric aggregations (sum, mean) returning the same values?
    • Are date ranges generating the correct number of timestamps?

Common Verification Checks

# Verify resample output hasn't changed
# Run with both old and new code, compare:
assert old_result.shape == new_result.shape
assert (old_result.values == new_result.values).all()

# Verify timezone handling
assert df.index.tz is not None, "Index should be timezone-aware"

# Verify date_range output
old_range = pd.date_range(start, end, freq="1h", inclusive="both")
assert len(old_range) == expected_count

Warnings to Watch For

After migration, run your code and watch for these FutureWarning messages in the console output — each indicates something that will break in a future pandas version:

  • FutureWarning: ... is deprecated and will be removed in a future version — frequency alias needs updating
  • FutureWarning: The behavior of DataFrame.sum with axis=None is deprecated — add axis= parameter
  • FutureWarning: Downcasting object dtype arrays... — explicit dtype conversion needed
  • FutureWarning: Setting an item of incompatible dtype... — check dtype compatibility
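One way to surface these warnings in CI rather than in console scrollback is to escalate FutureWarning to an error around the code path under test; a minimal sketch (the helper name assert_no_future_warnings is hypothetical; adapt to your test runner):

```python
import warnings

import pandas as pd

def assert_no_future_warnings(fn, *args, **kwargs):
    """Run fn and fail fast if it emits a FutureWarning."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", FutureWarning)
        return fn(*args, **kwargs)

df = pd.DataFrame(
    {"v": [1, 2]},
    index=pd.date_range("2024-01-01", periods=2, freq="h", tz="UTC"),
)

# Passes: "h" is the non-deprecated alias. Using "H" here would raise
# on Pandas 2.2+ instead of silently warning.
result = assert_no_future_warnings(lambda: df.resample("h").sum())
print(result["v"].tolist())  # [1, 2]
```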

This guide is based on the Pandas 1.x → 2.x migration of the Energyworx platform (March 2026). For the official Pandas migration documentation, see the Pandas 2.0 What's New and Pandas 2.2 What's New.